Abstract: Analysis of sequential event data has been recognized as one of the essential tools in the field of data modeling and analysis. In this paper, after examining the technical requirements and issues of modeling complex but practical situations, we propose a new sequential data model, dubbed Duration and Interval Hidden Markov Model (DI-HMM), that efficiently represents the “state duration” and “state interval” of data events. Both properties play an important role in representing practical time-series sequential data, and modeling them enables efficient and flexible sequential data retrieval. Numerical experiments on synthetic and real data demonstrate the efficiency and accuracy of the proposed DI-HMM.
I. Introduction

In the context of social science research, the analysis of sequential event data has been studied extensively. These studies have explored broad technical areas that range from biological data analysis to speech recognition, image classification, human behavior recognition, and time-series data analysis. Esmaeili et al. [1] categorize three types of sequential patterns after a theoretical investigation of large amounts of data. Lewis et al. [2] propose a sequential algorithm that uses special queries to train text classifiers. Song et al. [3] propose a sequential clustering algorithm for gene data. More recently, studies using sensor data analysis for human behavior recognition and video data understanding have received significant attention because of the progress in wearable devices [4], [5], [6]. Those devices enable users to record all of their experiences, such as what is viewed, what is heard, and what is noticed. Nevertheless, although collecting all observed data has become much easier, it remains difficult to immediately find the data that we want to access because the amount of time-series data is extremely large. In a life-log application, for example, it is easy to retrieve information about particular places or dates exactly if rich and comprehensive metadata are attached to every piece of data. However, if a query is ambiguous, such as retrieving a situation similar to the current situation, it is challenging to obtain meaningful results. Thus, finding similar sequential patterns in vast sequential data using a target pattern extracted from the current situation is of crucial importance. This problem is the subject of the present paper.
Finding similar sequential patterns requires discriminating particular sequential patterns from many partial groups of patterns. There are some useful methods for detecting similar partial patterns in sequential data. One traditional but representative method is the Dynamic Programming (DP) matching algorithm, which is typically used for speech and natural language processing. However, because practical sequential data always contain time misalignments of events, the sequential pattern detector must tolerate such misalignments; namely, the algorithm needs to extract not only the sequential patterns that have precisely the same durations and intervals of events, but also those with slightly different durations and/or slightly different intervals. An alternative modeling category is the statistical model, of which representatives are the Support Vector Machine (SVM) and the Hidden Markov Model (HMM). SVM was introduced in the area of statistical learning and is used for nonlinear classification such as image classification. Although SVM is powerful for classifying data based on the similarity of each feature, it is not specialized for sequential data. Meanwhile, HMM is specialized to deal with sequential data by exploiting the transition probability between states, i.e., events. Consequently, we focus solely on HMM and its extended methods in this study.

The primary contributions of our work are two-fold: (a) based on an analysis of the features and structure of sequential data, we extract the requirements for modeling it and advocate that supporting both “state duration” and “state interval” is of great significance for representing practical sequential data; and (b) we propose a new sequential model that extends the Hidden semi-Markov Model (HSMM) to support both of them efficiently. Regarding (a), we especially address the generalization of the requirements, and emphasize the importance of handling the event order, the continuous “duration” of an event, and the discontinuous “interval” time between two events. Hereinafter, because an event is treated as a state in HMM, we define the continuous duration of an event as the “state duration” and the discontinuous interval with no observation as the “state interval”. With respect to (b), after assessing the extended HMM methods in the literature against the requirements, we show that none of the existing models can treat both the state duration and the state interval simultaneously. Nevertheless, we also show that HSMM, one of the extended HMM methods, can handle the state duration and is an appropriate baseline to be extended to meet all the demands. Subsequently, we propose an extended model of HSMM that accommodates the state duration as well as the state interval.


The rest of the paper is organized as follows. The next section introduces related work on sequential data analysis. Section III analyzes the model requirements and assesses the adequacy of extended HMM methods against them. Section V explains the proposed method, Duration and Interval HMM (DI-HMM), and Section VI performs numerical evaluations. Finally, Section VII presents a summary of this paper and describes future work.


II. Related Work 

This section reviews related work on sequential data analysis. For sequential pattern matching and detection, DP matching [7] extracts similar sequential patterns from two different sequential patterns. It was first used for acoustic speech recognition, but is now widely applied to various fields, such as the analysis of biological sequences like DNA. For sequential pattern classification, SVM [8] and Probabilistic Graphical Models [9] have been proposed. SVM is a classification algorithm that produces a model using feature attributes extracted from training data, and calculates the distance between the training data and test data using these attributes. More recently, however, SVM has been extended to handle sequential data classification such as speech recognition and handwriting recognition. Shimodaira et al. propose an extended SVM that enables frame-synchronous recognition of sequential patterns [10]. A Probabilistic Graphical Model with a directed or undirected graph is the typical model for representing sequential patterns. Safari proposes a new Deep Learning model for sequential pattern recognition [11]. Lastly, HMM [12], [13] is a statistical tool for modeling sequences of observations. HMM is regarded as one kind of Probabilistic Graphical Model. While HMM is used for many applications, for example, speech recognition, handwriting recognition, and activity recognition, most extensions of HMM are specialized for individual application data.

To extract not only exactly the same sequential pattern but also sequential patterns with similar time durations and time intervals of events, a statistical representational capability is required. From this point of view, although pattern matching algorithms like DP matching consider the sequential order of events, they do not address finding such similar sequential patterns because they lack a statistical modeling capability. Furthermore, because most statistical modeling algorithms such as SVM mainly consider feature similarities, they do not directly take into account the structure of sequential data either. Hence, they do not handle the temporal order and ambiguity of events needed to find such similar sequential patterns efficiently. On the other hand, HMM, a statistical model, is powerful for treating sequential patterns because it expresses the probability of event orders through the transition probability between states. Therefore, we conclude that HMM is the most suitable model for treating sequential data, and that it has the potential to describe temporal ambiguity in order to find similar sequential patterns. As a result, we particularly examine HMM hereafter in this paper.

Many extended HMM methods have been proposed for respective application data. Some of the extended HMMs are specialized for sequential data. Xue et al. propose transition-emitting HMMs (TE-HMMs) and state-emitting HMMs (SE-HMMs) for treating discontinuous symbols [14]. Their studies target off-line handwritten word recognition, where the observation data include discontinuous and continuous parts between characters when cursive letters are written. They address such discontinuous and continuous features, and extend HMM to treat both of them. Bengio et al. propose IO-HMM for gesture recognition, which maps input sequences to output sequences during learning, whereas the original HMM learns only output sequence distributions [15]. IO-HMM supports a new maximum likelihood function, together with parameters for calculating the maximum likelihood that are extracted from training data consisting of pairs of input/output sequences. IO-HMM is a hybrid of generative and discriminative models that treats output probability estimation for both input sequences and observations. Salzenstein et al. deal with a statistical model based on fuzzy Markov random chains for image segmentation in the context of stationary and non-stationary data [16]. The model handles multispectral data and the estimation of hyperparameters in a non-stationary context. Yu et al. propose the Explicit-Duration Hidden Markov Model (EDM) [17]. Addressing the state interval between state transitions, they propose a new forward-backward algorithm to estimate the model parameters. The model treats the difference of state durations in all the states. Beal et al. propose HMM-selftrans, which is an extended model of EDM [18]. Furthermore, Yu et al. [19] and Murphy et al. [20] propose HSMM, which is a basic model of EDM. Their model treats the state duration and the number of observations produced while staying in the state. HSMM is applicable to many applications such as handwriting recognition, human behavior recognition, and other time-series data estimation [21], [22], [23]. These application data are all sequential. Although many extended HMM methods exist, they lack some capabilities needed to handle sequential data efficiently, as the next section explains; more specifically, they lack the capability to handle both the state duration and the state interval between events.

III. Sequential Data Analysis 

This section describes the model requirements for sequential data analysis, and compares how well each extended HMM satisfies them.

A. Notations 

The symbols used in this paper are defined in this section. First, denoting a certain time as $t$, we consider a sequence over the period $1 \le t \le T$. The observation at time $t$ is represented as $o_t$, and the observation sequence from time $t = t_1$ to $t = t_2$ is represented as $o_{t_1:t_2} = o_{t_1}, \cdots, o_{t_2}$. The length of $o_{t_1:t_2}$, i.e., $|o_{t_1:t_2}|$, is $t_2 - t_1 + 1$. Then, the target observation sequence to which a meaningful label has to be assigned, starting at time $t = 1$ and ending at $t = T$, is represented as $o_{1:T} = o_1, \cdots, o_T$, and the set of observable values is $V = \{v_1, \cdots, v_K\}$. Next, an elemental state is denoted as $s_{elm}$, and each state $s_{elm}$ has its own length in time units, defined as $d_{elm}$. In addition, each elemental state has one hidden state that belongs to the set of hidden states denoted as $S = \{S_1, \cdots, S_M\}$, where the size of the set, i.e., $|S|$, is $M$. Furthermore, $s_{elm}$ is alternatively denoted by its starting time $t_1$ and its ending time $t_2$ as $s_{t_1:t_2}$. In this case, the time length of $s_{t_1:t_2}$ is equal to $t_2 - t_1 + 1$.

A state sequence, on the other hand, is generated from multiple elemental states. This state sequence is represented in a bold font as $\mathbf{s}_{t_1:t_2}$ by specifying its starting time $t_1$ and its ending time $t_2$. The $n$-th constituent state in $\mathbf{s}_{t_1:t_2}$ is denoted as $s_n$, and its duration is $d_n$. For instance, the first state is represented as $s_1$, and the duration of $s_1$ is $d_1$. In particular, the state sequence corresponding to the entire period under consideration, namely from time $t = 1$ to $t = T$, is denoted as $\mathbf{s}_{1:T} = s_1, \cdots, s_N$, where $N$ is the total number of states in $\mathbf{s}_{1:T}$. It should be noted that the definition of the state sequence is similar to that of the observation sequence, but the number of constituent states, i.e., $|\mathbf{s}_{1:T}|$, is not equal to $T$ because each state may have a different time length, as explained above.
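The distinction between the number of states $N$ and the sequence length $T$ can be made concrete with a small sketch. The `Segment` class and the example values below are hypothetical and serve only to illustrate the notation; they are not part of the proposed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One elemental state s_{t1:t2} (hypothetical helper for illustration)."""
    label: str  # hidden state, an element of S
    start: int  # t1, 1-indexed and inclusive
    end: int    # t2, inclusive

    @property
    def duration(self) -> int:
        # Time length of s_{t1:t2} is t2 - t1 + 1.
        return self.end - self.start + 1


# A state sequence s_{1:T} made of N = 3 contiguous elemental states.
segments = [Segment("S1", 1, 3), Segment("S2", 4, 4), Segment("S1", 5, 8)]

N = len(segments)                           # number of constituent states
T = sum(seg.duration for seg in segments)   # total time length
print(N, T)
```

Here the same hidden state `S1` occurs twice with different durations, and $N = 3$ differs from $T = 8$.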

B. Requirement for Model Description 

This section discusses the requirements for the sequential data model using time-series data, which are representative of sequential data, as shown in Figure 1. Suppose that two different sequences of successive states, i.e., events, are observed from two different sensors. A state is represented as a block whose width represents its continuous state duration. Furthermore, those events are not observed continuously; that is, a discontinuous interval time may exist between two observed states. The length of this unobserved period is represented by the distance between two successive blocks. The gray-colored states in Figure 1 illustrate the extracted state sequence, $\mathbf{s}_{1:T}$, that forms one group. The group of states, $\mathbf{s}_{1:T}$, in this figure consists of four states: $s_1$, $s_2$, $s_3$, and $s_4$.


Fig. 1. Sequential data analysis.

Taking the example above into account, we now investigate the requirements for the sequential data model. First, it goes without saying that the states in Figure 1 are observed in a prescribed order. This must be described in the model (R1). Next, in some cases, several states can be observed in a partially overlapping manner, as with $s_2$ and $s_3$. In other words, multiple states might occur simultaneously during a certain period. Therefore, such overlapping states must be represented in the model (R2). Furthermore, the time lengths of the respective states differ from one another. This requires the model to express a “state duration” (R3). Finally, when two states do not occur in series without a time gap, a vacant time, possibly containing another state that is not involved in the group of the sequence, might exist between the two states. Additionally, the length of this vacant time can vary. Therefore, the “state interval” between two states must be described in the model (R4). Consequently, we conclude that the sequential data model needs to describe the items below.

R1 State order

R2 Staying in multiple states at a certain time

R3 State duration

R4 State interval

It should be noted that R2 has a different characteristic from the other items because R1, R3, and R4 are requirements for a single sequence, whereas R2 applies to multiple sequences. Therefore, we specifically examine requirements R1, R3, and R4 in this study. The examination of R2 is left for future work.

C. Requirement Assessment for Extended HMM Methods 

This section assesses whether HMM and the extended HMM methods meet the requirements analyzed in the previous section. Table I presents a comparison of the conventional HMM methods from the viewpoint of the model requirements. The basic HMM represents the order of the states, and all the extended HMM methods inherit this capability. IO-HMM treats the state duration, but its target of estimation is different. Therefore, it does not satisfy the model requirements. HSMM is proposed to model the remaining time length to stay in the same state. In addition, HMM-selftrans and EDM are extended models of HSMM. These methods satisfy the same requirements, the state order and the state duration, but do not support the state interval. As a result of the requirement assessment, we find that no existing model can accommodate both the state duration and the state interval simultaneously. Nevertheless, HSMM and its extensions express the state duration, a capability that the other methods do not support. Therefore, we conclude that HSMM is the best starting model for extension to our new model. The next section explains the general HSMM.


TABLE I. Requirement assessment of each method.

                         Requirements
Method                   State duration    State interval
HMM [13]                 -                 -
IO-HMM [15]              -                 -
HSMM [19], [20]          ✓                 -
HMM-selftrans [18]       ✓                 -
EDM [17]                 ✓                 -
DI-HMM (Proposal)        ✓                 ✓

IV. Hidden semi-Markov Model (HSMM) 

HSMM extends HMM by using a semi-Markov chain with a variable staying duration for each state [19]. The crucial difference between HMM and HSMM is the number of observations per state. HSMM treats the duration of staying in one state by introducing an additional parameter, specialized for describing the state duration, when calculating the transition and emission probabilities. Figure 2 illustrates how HSMM handles the state duration in each state, where $s_1$, $s_2$, and $s_n$ are the super state nodes. Suppose that the state duration of $s_1$ is $d_1$, and that $\{o_1, o_2, \cdots, o_{d_1}\}$ is the set of observations emitted from $s_1$ during $d_1$. After the state duration time $d_1$ has expired, $s_1$ transits to the next state $s_2$. Thus, HSMM handles the state duration time of each state $s_n$ by $d_n$.

Here, let $S_{m'}$ and $S_m$ be hidden states, and let $D_{m'}$ and $D_m$ be the lengths of time spent in states $S_{m'}$ and $S_m$, respectively. $S_{m'}$ and $S_m$ may occur more than once in a sequence, but we use the same $D_{m'}$ and $D_m$ regardless of the number of occurrences in the following equations. The number of observations from the state $S_m$ is determined by the duration $D_m$. Thus, the time length of the observation sequence is calculated as $T = \sum_{n=1}^{N} d_n$. Then, the transition probability from $S_{m'}$ with duration $D_{m'}$ to $S_m$ with duration $D_m$ is represented as $a_{(S_{m'},D_{m'})(S_m,D_m)}$. This is defined as

$$a_{(S_{m'},D_{m'})(S_m,D_m)} = P[s_{t:t+D_m-1} = S_m \mid s_{t-D_{m'}:t-1} = S_{m'}]. \tag{1}$$

The emission probability $b_{S_m,D_m}(o_{t:t+D_m-1})$ is denoted as

$$b_{S_m,D_m}(o_{t:t+D_m-1}) = P[o_{t:t+D_m-1} \mid s_{t:t+D_m-1} = S_m]. \tag{2}$$

Finally, the set of HSMM parameters, $\lambda$, is defined as

$$\lambda = \{ a_{(S_{m'},D_{m'})(S_m,D_m)},\; b_{S_m,D_m}(v_{k_1:k_{D_m}}),\; \pi_{S_{m'},D_{m'}} \}, \tag{3}$$

where $\pi_{S_{m'},D_{m'}}$ is the initial distribution of the state $S_{m'}$, and $v_{k_1:k_{D_m}}$ represents the sequence of observable values of size $D_m$. This is $v_{k_1}, \cdots, v_{k_{D_m}} \in V \times \cdots \times V$, where $v_{k_n}$ is the $n$-th observable value, and $k_n$ ($\le K$) is the corresponding index in $V$. For the estimation of the likelihood, we use an extended Viterbi algorithm [24] because it is the most popular algorithm for estimating the maximum likelihood. The forward variable in the algorithm is defined as

$$\delta_t(S_m, D_m) = \max_{\mathbf{s}_{1:t-D_m}} P[\mathbf{s}_{1:t-D_m},\, s_{t-D_m+1:t} = S_m,\, o_{1:t} \mid \lambda] = \max_{S_{m'} \in S \setminus \{S_m\},\, D_{m'}} \{ \delta_{t-D_m}(S_{m'}, D_{m'}) \cdot a_{(S_{m'},D_{m'})(S_m,D_m)} \cdot b_{S_m,D_m}(o_{t-D_m+1:t}) \}, \tag{4}$$

where $\delta_t(S_m, D_m)$ is the maximum likelihood that the partial state sequence ends at $t$ in the state $S_m$ of duration $D_m$.

V. Duration and Interval HMM (DI-HMM)

A. Motivation

HSMM introduced in the previous section handles the state duration. Thus, HSMM is often called an explicit duration model. However, because $S_{m'}$ moves to $S_m$ at the expiration of the duration time $D_{m'}$, HSMM as it is cannot describe the state interval between two states. The easiest way to make HSMM handle both the state duration and the state interval is to introduce a special state, called an interval state, that describes the time interval between two states. However, it is well known that introducing such an interval state between two states degrades the accuracy of discrimination [25].

To tackle this problem, we extend the conventional HSMM by newly introducing a state interval probability to each transition probability between two states. This paper calls the new extended model the Duration and Interval Hidden Markov Model (DI-HMM). The concept of DI-HMM is illustrated in Figure 3. Although the structure of DI-HMM is similar to that of HSMM described in Figure 2, the state interval probability is newly added as $L_{m',m}$, as illustrated in the figure. While the start time of $S_m$ immediately follows the end time of $S_{m'}$ in HSMM, in DI-HMM $S_m$ starts after a time of length $L_{m',m}$ has passed. The time length $T$ of the observation sequence varies because it depends on the lengths of the durations and the intervals, leading to $T = \sum_{n=1}^{N} (d_n + l_{n-1,n})$. The state interval is then described by the state interval probability. By exploiting the state interval probability, the proposed model handles the state interval in the extended HSMM. The proposed method uses DI-HMM for training a model, and the Viterbi algorithm is used for recognition of test data.

Fig. 3. Concept of DI-HMM using interval probability.


B. Training Sequential Data Model 

The details of DI-HMM are elaborated using the example data shown in Figure 4. The blocks with a slash-line pattern represent the data sequence of the training set; $l_{n-1,n}$ is the time difference between the end of $s_{n-1}$ and the beginning of $s_n$. Furthermore, $L_{m',m}$, which is not shown in Figure 4, represents the time difference between the end of $S_{m'}$ and the beginning of $S_m$ when the next state of $S_{m'}$ is $S_m$.


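First, the time interval probability density distribution is expressed by adopting the Gaussian distribution, $p(L_{m',m})$, as

$$p(L_{m',m}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \tag{5}$$

where $\sigma^2$ denotes the variance of the state interval $L_{m',m}$, and $\mu$ is the mean of $L_{m',m}$. Then, the set of parameters used in DI-HMM is defined as

$$\lambda = \{ a_{(S_{m'},D_{m'})(S_m,D_m)},\; b_{S_m,D_m}(v_{k_1:k_{D_m}}),\; \pi_{S_{m'},D_{m'}},\; p(L_{m',m}) \}. \tag{6}$$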

The transition and emission probabilities are defined in the same way as in HSMM. The difference between HSMM and DI-HMM is the additional parameter $p(L_{m',m})$. The Gaussian distribution is adopted as the interval distribution because it expresses the density distribution simply, and its parameters do not need to be changed for each probability. However, other distributions and functions could also be adopted in the proposed algorithm.

The range of $x$ might influence the memory consumption and/or the computational complexity of generating the model. If $p(L_{m',m})$ is generated in the training period, the limited range of $x$ may mean that no $x$ value matches an observed value. However, if $p(L_{m',m})$ is generated every time an observation is fed to the algorithm, the calculation cost can be much higher. Our motivation for introducing the interval distribution to HSMM is, as explained earlier, to find the similar part of sequential data including the interval, and also to discriminate between the target part and the similar part. Therefore, even if the probability of $L_{m',m}$ is assumed to be zero around the skirts of the distribution, no particular problem arises. Consequently, we introduce a boundary probability value $\theta_{pt}$ to determine the edge of the skirt of $p(L_{m',m})$. When generating $p(L_{m',m})$, the calculation is terminated when the probability value falls below $\theta_{pt}$.

Fig. 4. Sequential data and representations.
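The truncated interval distribution can be sketched as a lookup table over integer interval lengths. The sketch below is only an illustration under assumptions: the function names, the integer grid for $x$, and the `max_len` safety bound are hypothetical, and the fallback for out-of-range lengths follows one reading of Eq. (7) in the next subsection, applied to a single table rather than to all state pairs.

```python
import math


def interval_table(mu, sigma, theta_pt, max_len=1000):
    """Tabulate the Gaussian of Eq. (5) on integer lengths x >= 0,
    stopping on each side once the density falls below theta_pt."""
    def density(x):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(
            2 * math.pi * sigma ** 2
        )

    table = {}
    center = round(mu)
    x = center
    while x <= max_len and density(x) >= theta_pt:  # walk toward the right skirt
        table[x] = density(x)
        x += 1
    x = center - 1
    while x >= 0 and density(x) >= theta_pt:        # walk toward the left skirt
        table[x] = density(x)
        x -= 1
    return table


def interval_prob(table, length, c=0.5):
    """Look up p(L); lengths outside the table fall back to c * (smallest
    stored value), with 0 < c < 1. Assumes the table is not empty."""
    if length in table:
        return table[length]
    return c * min(table.values())


table = interval_table(mu=2.0, sigma=1.0, theta_pt=0.01)
print(sorted(table))
print(interval_prob(table, 10))
```

Precomputing the table once per state pair keeps the recognition step to a dictionary lookup, which is the trade-off between memory and calculation cost discussed above.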


C. Probability Estimation for Recognition 

The Viterbi algorithm is used to estimate the probability of a model, as in [26]. Then, the pair of the model and the estimated probability is stored as a candidate, and finally the maximum likelihood estimate is calculated for each state of each model.

First, we calculate the probability of $L_{m',m}$, i.e., $p(L_{m',m})$, by assigning $L_{m',m}$ to the new parameter distribution $p(L_{m',m})$ beforehand. If $L_{m',m}$ is out of the range of $p(L_{m',m})$, the time interval probability is determined as

$$p(L_{m',m}) = \min_{1 \le m' \le M,\, 1 \le m \le M} p(L_{m',m}) \times c. \tag{7}$$

Therein, $c$ is a constant satisfying $0 < c < 1$. Then, the forward variable for estimating the maximum likelihood is calculated as

$$\delta_t(S_m, D_m) = \max_{\mathbf{s}_{1:t-D_m}} P[\mathbf{s}_{1:t-D_m},\, s_{t-D_m+1:t} = S_m,\, o_{1:t} \mid \lambda] = \max_{S_{m'} \in S \setminus \{S_m\},\, D_{m'}} \{ \delta_{t-D_m}(S_{m'}, D_{m'}) \cdot a_{(S_{m'},D_{m'})(S_m,D_m)} \cdot p(L_{m',m}) \cdot b_{S_m,D_m}(o_{t-D_m+1:t}) \}. \tag{8}$$

The transition probability parameter $a_{(S_{m'},D_{m'})(S_m,D_m)}$ is the same as the one introduced in Eq. (4) in Section IV. The state interval probability is calculated at the same time as the likelihood is calculated recursively using the transition probability.
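To make the role of the interval probability concrete, the following sketch scores one fixed segmentation rather than running the full maximization. It assumes that the interval probability enters as one extra factor per transition, which is one reading of the description in this section; the dictionary layouts and all names are hypothetical.

```python
def di_hmm_path_score(segments, pi, a, b, p_interval):
    """Probability of one given segmentation under a toy DI-HMM.

    segments: list of (state, duration, observations, gap_before), where
    gap_before is the interval length preceding the segment (ignored for
    the first segment). The Viterbi recursion would take the maximum of
    such products over all segmentations.
    """
    score = 1.0
    prev = None
    for state, dur, obs, gap in segments:
        if prev is None:
            score *= pi.get((state, dur), 0.0)  # initial distribution
        else:
            # HSMM transition probability between (state, duration) pairs.
            score *= a.get((prev, (state, dur)), 0.0)
            # Additional DI-HMM factor: probability of the observed interval.
            score *= p_interval.get((prev[0], state), {}).get(gap, 0.0)
        score *= b.get((state, dur, tuple(obs)), 0.0)  # segment emission
        prev = (state, dur)
    return score


pi = {("A", 2): 1.0}
a = {(("A", 2), ("B", 1)): 0.5}
b = {("A", 2, ("on", "on")): 1.0, ("B", 1, ("on",)): 1.0}
p_interval = {("A", "B"): {1: 0.4, 2: 0.2}}

short_gap = [("A", 2, ["on", "on"], 0), ("B", 1, ["on"], 1)]
long_gap = [("A", 2, ["on", "on"], 0), ("B", 1, ["on"], 2)]
print(di_hmm_path_score(short_gap, pi, a, b, p_interval))
print(di_hmm_path_score(long_gap, pi, a, b, p_interval))
```

The two segmentations share states and durations and differ only in the gap, so without the interval factor they would receive the same score; with it, the scores differ.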

The difference between HSMM and DI-HMM is the capability of handling the time interval between states, as explained earlier. DI-HMM integrates the interval probability into the likelihood calculation to account for the time interval between states. This might incur additional calculation cost. Hence, it is necessary to evaluate the calculation cost of the model. In addition, the observation distributions $b_{S_m,D_m}(o_{1:D_m})$ can be parametric or non-parametric. In this proposal, the relation between the state duration and the state interval is not represented in the model. For that reason, $b_{S_m,D_m}(o_{1:D_m})$ is handled as non-parametric, discrete, and independent of the state durations. Likewise, $p(L_{m',m})$ is discrete and independent of the state duration and the transition probability.

VI. Numerical Experiments 

This section compares DI-HMM and HSMM on the following items: Section A describes the discrimination performance, Section B describes the recognition performance, and Section C describes the calculation time of training and recognition of DI-HMM. Figure 5 portrays an example of how the synthetic data for the evaluation are generated. First, the number of states $N$, the minimum duration $d_{min}$, the maximum duration $d_{max}$, the minimum state interval $l_{min}$, and the maximum state interval $l_{max}$ are given as initial parameters. The length of the sequence $T$ is also given in the evaluation. In the example of Figure 5, the number of states and the number of durations are both $N$, and the number of intervals is $N - 1$. The lengths of each duration and each state interval are incremented one by one. Then, all those lengths are combined with round robin.



Fig. 5. Sequential data generation. N is given as the number of states. 
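The generation procedure can be sketched as an exhaustive enumeration. The sketch assumes that “combined with round robin” means taking every combination of duration and interval lengths, and that only combinations whose total length equals $T$ are kept; both points are interpretations, and the function names and the binary on/off encoding are hypothetical.

```python
from itertools import product


def generate_sequences(n_states, d_min, d_max, l_min, l_max, total_len):
    """Yield (durations, intervals) pairs whose lengths sum to total_len."""
    durations = range(d_min, d_max + 1)
    intervals = range(l_min, l_max + 1)
    for ds in product(durations, repeat=n_states):
        for ls in product(intervals, repeat=n_states - 1):
            if sum(ds) + sum(ls) == total_len:
                yield ds, ls


def to_symbols(ds, ls):
    """Expand durations and intervals into a 0/1 sequence (1 = state observed)."""
    seq = []
    for n, d in enumerate(ds):
        seq += [1] * d
        if n < len(ls):
            seq += [0] * ls[n]
    return seq


# Small example: 2 states, durations 1..2, intervals 1..2, total length 4.
small = list(generate_sequences(2, 1, 2, 1, 2, 4))
print(small)
print(to_symbols((1, 2), (1,)))
```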


A. Discrimination Performance 

First, we generate 200 different sequences by fixing $T = 14$, $N \in \{3, 4\}$, $d_{min} = 1$, $d_{max} = 10$, $l_{min} = 1$, and $l_{max} = 4$. Figure 6 shows the generated example data. Each row represents one sequence of data. The gray blocks are observed states, and the length of each block represents the duration. To evaluate the discrimination performance, we compare the likelihoods calculated using Eq. (4) for the test data against each training datum. Discrimination means that the likelihoods for the individual training data differ from one another.

Figure 7 exhibits the results for Data 5, 10, and 15, extracted from the sequential data presented in Figure 6. The x-axis shows each training datum, and the y-axis shows the likelihood for each test datum. In this figure, the likelihoods of HSMM are around 0.5 for each training datum (Data 5, 10, and 15). This means that all the test data have similar data models and cannot be discriminated by HSMM. On the other hand, in the DI-HMM case, each result has a peak at the corresponding training datum. This means that DI-HMM can discriminate those data; that is, it can discriminate differences in both the state duration and the state interval.

Fig. 6. Example data sequences from Data 1 through Data 20.




Fig. 7. Likelihood for respective training data. 


Figure 8 shows the discrimination performance against each different dataset when the number of training data is changed from 1 to 6. The x-axis shows the number of training data; the y-axis shows the Error Rate of Discrimination (ERD). From the result, DI-HMM discriminates among the entire sequences for the most part by checking the difference in state interval length. Even when only one training datum is used, the performance is high: the ERD of DI-HMM is close to 0.1, while the ERD of HSMM stays around 0.6. Therefore, DI-HMM has a powerful discriminative capability to recognize differences among sequences.

B. Recognition Performance 

The results of the discrimination performance show that the likelihood of DI-HMM takes its maximum value at the data that have the same duration and the same state interval among all training data. However, in practice, robustness to a slight time delay or time extension of data with the same label is required. For instance, in a music performance, the lengths of a sustained sound differ slightly when the same rhythm is played by two different instruments or by two players. Therefore, we evaluate the recognition performance when individual sounds have such a time delay and/or a different sustained length.

Fig. 8. Number of training data vs. ERD.

Fig. 9. Part of an example rhythm score and the waveform.


To evaluate the impact of time differences among data assigned to the same label, we use music sound data played by different instruments. Figure 9 presents an example of the rhythm scores. The two waveforms are monophonic waveforms of sound played by an organ and a drum. The notes in the organ waveform are longer than those in the drum waveform. In this experiment, the music data for the evaluation are generated using the following steps:

a) Divide an input waveform into bars, which are small pieces of music containing a fixed number of beats.

b) The length of a period in which sound exists corresponds to the state duration time, whereas the length of a period in which no sound exists corresponds to the state interval time.

c) The observation sequence consists of the sound “on” symbol and the “off” symbol.
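Steps b) and c) amount to a run-length encoding of the on/off symbol sequence of one bar. A minimal sketch follows; the function name, the symbol strings, and the choice to discard leading and trailing silence are assumptions made for illustration.

```python
from itertools import groupby


def durations_and_intervals(symbols, on="on"):
    """Split an on/off symbol sequence into state durations (runs of `on`)
    and state intervals (runs of anything else between two `on` runs)."""
    runs = [
        (is_on, sum(1 for _ in group))
        for is_on, group in groupby(symbols, key=lambda s: s == on)
    ]
    # Silence before the first sound or after the last one is not an
    # interval between two states, so it is dropped here.
    while runs and not runs[0][0]:
        runs.pop(0)
    while runs and not runs[-1][0]:
        runs.pop()
    durations = [n for is_on, n in runs if is_on]
    intervals = [n for is_on, n in runs if not is_on]
    return durations, intervals


bar = ["off", "on", "on", "off", "on", "off", "off", "on", "off"]
print(durations_and_intervals(bar))
```

Two notes played back to back with no silence between them appear as a single run in this encoding, so separating them would need onset information that the on/off symbols alone do not carry.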

For this experiment, we use the bass part of “We Wish 
You A Merry Christmas” that consists of 57 bars. First, we 
prepare six kinds of music sound which are played by different 
instruments with different lengths of notes. Table II shows each 
of differences in the experimental data set. In the experiment, 
Data 1, 2, and 3 are used as training data; and Data 4, Data 
5, and Data 6 are used as test data. 

For the training phase, the first bars of Data 1, 2, and 
3 are trained for the model labeled as “1”. For the recog- 





























TABLE II. 


Generated Music Data. 


i 


Index 

Instrument 

Minimum Fength of Note 

Data 1 

Grand Piano 

3/4 length of crotchet 

Data 2 

Grand Piano 

1/2 length of crotchet 

Data 3 

Puncy Grand Piano 

1/4 length of crotchet 

Data 4 

Electric Piano 

1/4 length of crotchet 

Data 5 

Drum 

1/4 length of crotchet 

Data 6 

Organ 

1/2 length of crotchet 


nition phase, the probability of each model is calculated. 
The recognition result (the estimated label) is the label of
the model with the maximum probability. We evaluate the
recognition accuracy using the F-measure, calculated as
2 · recall · precision / (recall + precision), where
precision = TP/PP and recall = TP/AP. Here, Predicted
Positive (PP) is the number of models whose likelihood,
calculated using Eq. 0 or Eq. (8), is the maximum among all
models; True Positive (TP) is the number of correct models in
PP; and Actually Positive (AP) is the number of labeled
models. We prepare and evaluate two patterns of data by
changing the length of the observation sequence. In the first
pattern, each observation sequence consists of the sound
pattern of one bar, whereas in the second pattern each
observation sequence consists of the sound pattern of two
consecutive bars. In the first case, we obtain 57 labeled
models and extract the 40 bars whose rhythms differ from one
another; these 40 bars are used for the experiment. In the
training phase, 40 models are trained on all 120 (= 40 bars
× 3 data) training data. Then, in the recognition phase,
another 120 data are tested. Similarly, in the second case,
the number of labeled models is 26, the number of training
data is 78 (= 26 bars × 3 data), and the number of test data
is 78.
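The recognition rule and the accuracy measures above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `recognize` and `f_measure`, the likelihood values, and the counts passed in are all hypothetical, and the model likelihoods themselves are assumed to have been computed elsewhere.

```python
def recognize(likelihoods):
    """Return the label of the model with the maximum likelihood.

    `likelihoods` maps a model label to the likelihood of the test
    sequence under that model (assumed to be computed elsewhere).
    """
    return max(likelihoods, key=likelihoods.get)


def f_measure(tp, pp, ap):
    """Compute (precision, recall, F-measure) from TP, PP, and AP counts."""
    precision = tp / pp if pp else 0.0
    recall = tp / ap if ap else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * recall * precision / (recall + precision)
    return precision, recall, f


# Hypothetical likelihoods of one test bar under three labeled models.
print(recognize({"1": 0.02, "2": 0.10, "3": 0.05}))  # 2

# Hypothetical counts: TP = 3, PP = 4, AP = 5.
precision, recall, f = f_measure(tp=3, pp=4, ap=5)
print(precision, recall, round(f, 4))  # 0.75 0.6 0.6667
```

The zero checks only guard against division by zero when no model is predicted or labeled; they are a convenience of this sketch rather than part of the definition.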

Figure 10 and Figure 11 show the recognition results for the
first and second patterns, where the red bars show the results
of DI-HMM and the blue bars show those of HSMM. In both
figures, the precision, recall, and F-measure of DI-HMM are
all higher than those of HSMM. Therefore, our proposed model
is effective for recognizing the rhythm patterns of music
played by various instruments. Furthermore, comparing
Figure 10 with Figure 11, the recognition accuracy for
sequences consisting of two bars is worse than that for
sequences consisting of one bar. This result can be explained
as follows: when a sequence is longer and the same symbol
appears many times in it, durations and intervals of various
lengths are all trained into the same duration and interval
probabilities between the same two states. In practical data
such as music data or life event data, however, the symbols of
the observation sequence differ and take a wide variety of
values. Consequently, when various symbols are included in a
sequence, the recognition accuracy is expected to improve even
if the sequence is longer.

Fig. 10. Recognition accuracy when the sequence consists of one bar.

Fig. 11. Recognition accuracy when the sequence consists of two bars.

C. Calculation time of Training and Recognition

For the calculation time evaluation, we generate 35 sequences,
fixing $d_{\max} = 2$, $I_{\min} = 1$, and $I_{\max} = 10$; $T$
is not fixed a priori. Using the generated data, we compare the
training time and the recognition time while changing the number
of training data. The results for training time and recognition
time are shown in Figure 12 and Figure 13, respectively. The
x-axis shows the number of training data, and the y-axis shows
the calculation time for training or recognition. The red line
is the result of DI-HMM, and the blue line is the result of
HSMM. The slopes of the training time and the recognition time
for DI-HMM are steeper than those for HSMM. Thus, introducing
the interval probability into HSMM incurs additional
calculation cost. However, it does not severely affect the
total amount of calculation. Meanwhile, as shown in the
previous sections, DI-HMM gives better recognition and
discrimination performance than HSMM. Therefore, we conclude
that our proposed DI-HMM is very effective for the sequential
data analysis that motivated this paper.
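One simple way to compare how calculation time grows with the number of data is to fit a least-squares line to (data size, time) pairs and compare the slopes. The sketch below assumes this approach; the helper `fit_slope` is hypothetical, and the timing values are placeholders chosen only to make the example run. They are not measurements from this experiment.

```python
def fit_slope(sizes, times):
    """Least-squares slope of time against data size (seconds per datum)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    return sxy / sxx


# Placeholder values, NOT results from the paper.
sizes = [5, 10, 15, 20]
times_a = [1.0, 2.0, 3.0, 4.0]  # stands in for the slower-growing model
times_b = [1.0, 1.5, 2.0, 2.5]  # stands in for the faster model

slope_a = fit_slope(sizes, times_a)
slope_b = fit_slope(sizes, times_b)
print(slope_a > slope_b)  # True: model "a" grows faster with data size
```

A steeper slope means each additional training or test datum costs more time, which is the sense in which the slopes of the two models are compared above.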

Fig. 12. Training time with the training dataset size.

Fig. 13. Recognition time with test dataset size.

VII. Summary and Future Work

In this paper, we investigated the requirements for
sequential data analysis by focusing on the structure and
features of sequential data. We then proposed the Duration
and Interval HMM (DI-HMM), which introduces the interval
probability into HSMM in order to handle both the state
duration and the state interval. Through computational
simulation, we evaluated the discrimination performance and
the recognition performance, and measured the calculation time
for training and recognition. In the evaluation of
discrimination performance, DI-HMM discriminated between
different sequences with fewer training data. The
discrimination error rate is less than 0.1 if we selectively
train on more than two sequences. Therefore, DI-HMM is
powerful enough to find even a sequence that is not included in
the training data. This feature allows us to easily add new
labels to existing databases of training data. Furthermore,
the evaluation results obtained using the sound data show that
DI-HMM achieves higher performance for rhythm pattern
recognition than HSMM by taking slight time delays into
account. Therefore, we can say that DI-HMM supports temporal
order as well as temporal ambiguity of events, and so finds
similar sequential patterns efficiently. However, the
evaluation of the calculation time shows that the proposed
method requires additional time to treat the interval, and
that still more time may be needed as the number of training
data increases. Future studies will compare our proposed
method with a further different method that introduces an
interval state node into HSMM, and will evaluate the training
and recognition time and the memory consumption. Additionally,
we shall improve DI-HMM to reduce such calculation costs and
to facilitate its application as an online system.